Jul 04, 2024

Web Scraping in Golang with Colly

Sora Fujimoto

AI Solutions Architect

Web scraping is a method used to extract data from websites. In Golang, the Colly library is a popular tool for web scraping due to its simplicity and powerful features. This guide will take you through setting up a Golang project with Colly, building a basic scraper, handling complex data extraction scenarios, and optimizing your scrapers with concurrent requests.

Setting Up Your Golang Project

Before you begin, ensure you have Go installed on your system. Initialize your project and fetch the Colly package with these commands:

```bash
go mod init my_scraper
go get -u github.com/gocolly/colly
```

go mod init creates a go.mod file that makes the current directory a Go module named my_scraper, and go get downloads Colly and records it as a dependency in go.mod.


Building a Basic Scraper

Let's create a basic scraper to extract all links from a specific Wikipedia page.

Create a new file main.go and add the following code:

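```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Only requests to en.wikipedia.org are allowed; other domains are rejected.
	c := colly.NewCollector(
		colly.AllowedDomains("en.wikipedia.org"),
	)

	// Runs for every element matching the CSS selector .mw-parser-output,
	// the container that wraps the article body.
	c.OnHTML(".mw-parser-output", func(e *colly.HTMLElement) {
		// Collect the href attribute of every <a> inside the container.
		links := e.ChildAttrs("a", "href")
		fmt.Println(links)
	})

	// Visit fetches the page and fires the callbacks registered above.
	if err := c.Visit("https://en.wikipedia.org/wiki/Web_scraping"); err != nil {
		log.Fatal(err)
	}
}
```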

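This code creates a Colly collector restricted to en.wikipedia.org and registers an OnHTML callback for the .mw-parser-output div, which wraps the article body. When c.Visit fetches the page, the callback collects the href attribute of every link inside that div and prints the list. The links are printed exactly as they appear in the HTML, so many of them are relative paths such as /wiki/Data_mining.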

Scraping Table Data

For more complex tasks, such as scraping table data and writing it to a CSV file, you can combine Colly with Go's encoding/csv package. Replace the contents of main.go with the following program:

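```go
package main

import (
	"encoding/csv"
	"log"
	"os"
	"strings"

	"github.com/gocolly/colly"
)

func main() {
	fName := "data.csv"
	file, err := os.Create(fName)
	if err != nil {
		log.Fatalf("could not create file %q: %v", fName, err)
	}
	defer file.Close()

	writer := csv.NewWriter(file)

	c := colly.NewCollector()

	// Called once for every <table class="wikitable"> on the page.
	c.OnHTML("table.wikitable", func(e *colly.HTMLElement) {
		e.ForEach("tr", func(_ int, row *colly.HTMLElement) {
			rowData := []string{}
			row.ForEach("td", func(_ int, cell *colly.HTMLElement) {
				// Trim the whitespace and newlines around each cell's text.
				rowData = append(rowData, strings.TrimSpace(cell.Text))
			})
			// Header rows contain only <th> cells; skip them instead of writing blank lines.
			if len(rowData) == 0 {
				return
			}
			if err := writer.Write(rowData); err != nil {
				log.Println("write error:", err)
			}
		})
	})

	if err := c.Visit("https://en.wikipedia.org/wiki/List_of_programming_languages"); err != nil {
		log.Fatal(err)
	}

	// Flush buffered records to the file and report any write error.
	writer.Flush()
	if err := writer.Error(); err != nil {
		log.Fatal(err)
	}
}
```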

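For each row of every table with the class wikitable, this program writes the text of the row's td cells to data.csv as one CSV record; header cells (th) are not captured. If the page you target contains no table.wikitable element, the callback never fires and data.csv stays empty, so check the page's markup in your browser's developer tools before running the scraper.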

Making Concurrent Requests

To speed up scraping, you can make concurrent requests using Go's goroutines. Here's how you can scrape multiple pages concurrently:

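```go
package main

import (
	"fmt"
	"log"
	"sync"

	"github.com/gocolly/colly"
)

// scrape fetches one page with its own collector and prints its <title>.
func scrape(url string, wg *sync.WaitGroup) {
	// Mark this goroutine as finished when scrape returns.
	defer wg.Done()

	c := colly.NewCollector()

	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println("Title found:", e.Text)
	})

	if err := c.Visit(url); err != nil {
		log.Printf("visit %s: %v", url, err)
	}
}

func main() {
	var wg sync.WaitGroup
	urls := []string{
		"https://en.wikipedia.org/wiki/Web_scraping",
		"https://en.wikipedia.org/wiki/Data_mining",
		"https://en.wikipedia.org/wiki/Screen_scraping",
	}

	for _, url := range urls {
		wg.Add(1) // register the goroutine before starting it
		go scrape(url, &wg)
	}

	// Block until every scrape call has called wg.Done.
	wg.Wait()
}
```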

In this example, the scrape function takes a URL and a pointer to a sync.WaitGroup. It creates its own Colly collector, registers a callback that prints the page title, and visits the URL; the deferred wg.Done call marks the goroutine as finished. The main function adds one to the wait group for each URL, starts scrape in a new goroutine, and then blocks on wg.Wait until all of them are done. Because the requests run concurrently, the titles may be printed in a different order on each run.


Other Web Scraping Libraries for Go

In addition to Colly, several other libraries are available for web scraping in Golang:

GoQuery: An HTML parsing library with a jQuery-like syntax for selecting and traversing elements. It does not download pages itself; Colly uses it under the hood for its CSS selectors.
Ferret: A portable, extensible, and fast web scraping system designed to simplify data extraction from the web. Instead of writing imperative code, you describe what to extract in its own declarative query language (FQL).
Selenium: A browser automation tool that drives real browsers, optionally in headless mode, which makes it well suited to pages that render their content with JavaScript. Selenium has no official Go bindings, but community-maintained clients such as github.com/tebeka/selenium let you use it from Go projects.

Conclusion

Web scraping is a powerful and essential skill for efficiently extracting data from websites. Using Golang and the Colly library, you can build robust scrapers that handle various data extraction scenarios, from collecting simple links to extracting complex table data and optimizing performance with concurrent requests.

In this guide, you learned how to:

Set up a Golang project with the Colly library.
Build a basic scraper to extract links from a webpage.
Handle more complex data extraction, such as scraping table data and writing it to a CSV file.
Optimize your scrapers by making concurrent requests.

By following these steps, you can create effective and efficient web scrapers in Golang that take advantage of Colly's simple, callback-based API.

FAQ

1. Is Colly suitable for beginners learning web scraping in Golang?

Yes. Colly is designed to be simple and beginner-friendly while still offering powerful features like DOM parsing, request handling, callbacks, and concurrency. Even new Go developers can quickly build a functional scraper with just a few lines of code.

2. Can Colly scrape structured content such as tables or lists?

Absolutely. Colly lets you select specific HTML nodes and attributes with CSS selectors, which makes it easy to extract tables, lists, links, and other structured elements. You can then write the results to CSV or JSON files with Go's standard library packages encoding/csv and encoding/json.

3. How can I speed up my Colly web scraper?

You can use Go's goroutines to process multiple pages in parallel. When you launch scrapers concurrently and synchronize them with a wait group, the requests overlap instead of running one after another, which can substantially cut the total run time of large or multi-URL crawls. Keep the number of simultaneous requests bounded so you don't overload the target site.
